Introduction to R and RStudio

Introduction

In this tutorial you will learn about one way that you can use RStudio to interact with R: working at the R console. We will also introduce package installation and talk about how to get help in R. Along the way we will introduce some useful things like defining variables and making basic R plots.

The R console

Start by opening RStudio. You should see something similar to the following screenshot. The pane on the left is the R console. Note that each pane has multiple tabs so you can change, for example, from the console to a terminal by selecting the Terminal tab. You will almost always have the R console in this pane. When you open RStudio, information about the version of R you are using will be printed in the console.

Layout of RStudio, including R console (left).

Using R as a calculator

One of the primary ways you will interact with R is through the console. You can type R commands at the prompt (the orange >) and hit return (or enter) to run the command. Let’s try a simple command. Type 2+2 at the prompt and hit return.

Using R as a calculator.
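As a small sketch of what this looks like in practice, here are a few calculator-style commands. The lines starting with #> show the output R prints; the extra operators (*, /, ^) are not discussed above and are included only for illustration.

```r
2 + 2
#> [1] 4

# A few other arithmetic operators work the same way
7 * 6
#> [1] 42
10 / 4
#> [1] 2.5
2^3
#> [1] 8
```

The [1] at the start of each result is R's way of labeling the first element of the output.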

R does the calculation for us and prints out the result. What if we type an incomplete command? Try typing 2 + and then hit return. This command is incomplete because we have not told R what it should add to 2.

An incomplete command. R waits for you to complete it.

This time, instead of giving us a result, R prints +, indicating that it is expecting additional information so that it can complete the calculation. We can complete the command by typing 2 then hitting return. R realizes that the command is complete and performs the calculation. The fact that R recognizes incomplete commands and waits for them to be completed (instead of treating them as errors) is something we can take advantage of if we want to enter long commands in the console. Later, we will take advantage of this behavior to format our code in other contexts as well.

Completing the command.
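The same behavior lets us split a long command over several lines. The following is a minimal sketch; the name total is just an illustrative choice, and the assignment arrow <- is explained later in this tutorial.

```r
# Ending a line with + leaves the expression unfinished,
# so R keeps reading on the next line before it calculates anything.
total <- 1 + 2 + 3 +
  4 + 5
total
#> [1] 15
```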

What if we actually make a mistake? Let’s try entering a command that R does not know how to execute. Try entering 2 + two at the prompt. R prints an informative error message. R does not recognize the word two, so it tells you that there was an error and that it could not find an object called two.

What is this “two” you speak of? Error messages in R.
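If you would like to look at an error message without the command simply stopping, one option is to wrap it in tryCatch, a base R function not covered in this tutorial. This is only a sketch; msg is an illustrative name, and the message text shown assumes an English-language R session in which nothing called two has been defined.

```r
# Typed directly at the console, this line stops with an error:
#   2 + two
# Wrapping it in tryCatch() captures the error message as text instead.
msg <- tryCatch(
  2 + two,
  error = function(e) conditionMessage(e)
)
msg
#> [1] "object 'two' not found"
```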

Installing packages

Packages extend the functionality of R beyond what is included in a base installation. An R package might allow you to import data from a Google Sheet, or it might add statistical tests that are not included in base R. Packages are written by other R users and made available to the R community. There are MANY R packages available. If you are working on a problem, it is often worth checking to see if someone has developed a package that will help you with your problem. Some day you may write an R package!

One package (actually a collection of several packages) we will use quite a bit is the tidyverse. The packages in the tidyverse include a lot of helpful tools for importing and working with data. It is usually possible to solve a problem multiple ways in R. Approaches that use tidyverse functions are often more efficient than others, and the resulting code is usually easier to read. If you are trying to solve a problem, I recommend starting out by looking for solutions within the tidyverse.

To show you how to install packages, and so that it is ready for you to use later, we will install the tidyverse package.

There are two common ways to install packages in RStudio. One way is to use the Tools menu. Another is to install packages using the R console. To install the tidyverse package using the R console, simply enter the command install.packages('tidyverse'). You need to be connected to the internet, because R will download the package from the internet and install it.

Installing the tidyverse package.
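Before installing, it can be handy to check whether a package is already on your computer. The sketch below uses requireNamespace, a base R function that is not part of the original walkthrough; pkg and already_installed are illustrative names. It only reports what it finds and does not download anything itself.

```r
pkg <- "tidyverse"

# TRUE if the package can be found in your R library, FALSE otherwise
already_installed <- requireNamespace(pkg, quietly = TRUE)

if (already_installed) {
  message(pkg, " is already installed.")
} else {
  # Installing requires an internet connection, so we only print a reminder here
  message(pkg, " is not installed yet. Run install.packages('tidyverse') to install it.")
}
```

A package only needs to be installed once, although you may want to reinstall it occasionally to pick up updates.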

Getting help

One of the most useful features of the R console is that you can use it to get help. If you want to know more about a particular R command, you can enter ?[command], where [command] is the name of the command. For example, if we need help using the install.packages command we would enter ?install.packages. In response, information about the command appears in the Help tab in the bottom right pane of the RStudio window.

Asking R for help.
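The question mark is shorthand for the help function, so the same request can be written as an ordinary function call. This is a brief sketch; help_page is an illustrative name, and storing the result in a variable is done here only so that nothing is displayed until you ask for it.

```r
# At the console you would normally just type:
#   ?install.packages
# The equivalent function call, stored instead of displayed:
help_page <- help("install.packages")

# Entering help_page at the console then displays the documentation
# (in RStudio it opens in the Help tab).
```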

Going beyond simple calculations in R

So far we have not done much in R that you couldn’t already do with a basic calculator. R is extremely versatile and powerful, and we want to take advantage of its ability to help us solve complex problems. To do this requires learning some more R commands and about how R represents data. Later we will learn how to string multiple commands together in a script to solve complex problems in a reproducible way.

We will start by introducing a few fundamental things that we will use over and over again as we work with R.

Variables

When working with real data sets (even small ones) it quickly becomes tedious to enter the same information into R again and again to do computations with the data (even taking advantage of copying and pasting). Suppose, for example, you wanted to compute the mean, median, and standard deviation of a list of numbers. We don’t want to type the same list of numbers into the console each time we want to do a computation with it. Instead, it is useful to use a variable name to refer to the list and to use that variable name to stand in for our list each time we do a computation with it.

Let’s start simple. Suppose we want to use the variable a to refer to the number 2. We can assign the value 2 to a using the assignment arrow <- in R. Enter the command a <- 2 into the console. You should take note of two things: First, R does not return anything when you do this. The value 2 is assigned to a, and R has nothing to say about it. Second, the top-right pane in the RStudio window now shows that a has the value 2. This is in the Environment tab. This tab keeps track of the variables that you have defined in your R session. (If you restart R, it will forget this information, so don’t expect R to think of 2 every time you type a forever more.)

Defining a variable in R.

You can have R print out the value of a by simply entering the variable name a at the prompt.

Printing the value of a.

Of course, the real advantage is we can now use a in calculations. Try entering a+a at the console.

Doing calculations with variables.

We can even store the result of a calculation using variables as a new variable. Try entering b <- a + a and then print out the value of b by entering b. What changed in the Environment tab?

Using a variable to store the results of a calculation.
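Putting the steps of this section together, the console session looks roughly like the sketch below. The lines starting with #> show the output R prints.

```r
a <- 2       # assign the value 2 to a; R prints nothing
a            # entering the name prints the value
#> [1] 2

a + a        # use a in a calculation
#> [1] 4

b <- a + a   # store the result of the calculation in a new variable
b
#> [1] 4
```

After running these commands, the Environment tab should list both a and b.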

Vectors

When we are working with data, we usually work with more than one value at a time. R uses vectors to store lists of data that have values of a single type (e.g., all of the values are numeric). We can also use variables to refer to vectors. There are many ways to construct vectors in R. The simplest is to construct a vector manually using the c command (short for combine or concatenate).

Let’s build a vector using the c command. We will give the vector a name, u, so that we can refer to it later. Try entering the command u <- c(0, 0.25, 0.5, 0.75, 1). Then enter u to print the value of u. What changed in the Environment tab?

Defining a vector manually using the c function.
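Here is what that looks like at the console. The call to length is an extra not mentioned above; it simply reports how many elements the vector has.

```r
u <- c(0, 0.25, 0.5, 0.75, 1)   # combine five numbers into one vector
u
#> [1] 0.00 0.25 0.50 0.75 1.00

length(u)                       # number of elements in u
#> [1] 5
```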

We can do calculations with vectors. Some commands will do the same calculation for every element of the vector. For example, we can add 2 to each element of u by entering u + 2. Note that the vector u is not changed by the calculation. R simply prints the results to the screen. To change the values of u we would need to assign the results to the variable u. This could be achieved with the command u <- u + 2 (not shown).

Adding 2 to all the elements of u
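The sketch below shows that u + 2 leaves u itself untouched. To keep the result, we store it under a new name; shifted is an illustrative name chosen here so that u keeps its original values for the examples that follow.

```r
u <- c(0, 0.25, 0.5, 0.75, 1)

u + 2               # adds 2 to every element and prints the result
#> [1] 2.00 2.25 2.50 2.75 3.00

u                   # u itself has not changed
#> [1] 0.00 0.25 0.50 0.75 1.00

shifted <- u + 2    # store the result if you want to keep it
shifted
#> [1] 2.00 2.25 2.50 2.75 3.00
```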

Other commands will use all of the values in the vector to compute a single summary value. For example, we could use the mean function to compute the mean of the values in the vector u. Try entering mean(u).

Calculating the mean of the values in a vector.
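The median and standard deviation mentioned earlier work the same way, using the base R functions median and sd. A short sketch:

```r
u <- c(0, 0.25, 0.5, 0.75, 1)

mean(u)     # average of the values
#> [1] 0.5

median(u)   # middle value
#> [1] 0.5

sd(u)       # sample standard deviation
```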

We can also do calculations that involve more than one vector. If we define another vector with the same length as u (let’s call the new vector v), then we can add u and v together. The result is a vector of the same length whose elements are the sums of the corresponding elements of u and v. Try entering v <- c(-1, 0, 1, 0, -1) and then u + v.

Adding two vectors.
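In code, the element-by-element addition looks like this. Each entry of the result is the sum of the entries in the same position of u and v.

```r
u <- c(0, 0.25, 0.5, 0.75, 1)
v <- c(-1, 0, 1, 0, -1)

u + v   # 0 + (-1), 0.25 + 0, 0.5 + 1, 0.75 + 0, 1 + (-1)
#> [1] -1.00  0.25  1.50  0.75  0.00
```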

Much of the time we will not want to enter the values of a vector manually. Sometimes we will read the values in from an existing data set (e.g., from a text file). We will discuss this in the next tutorial. There are also commands to construct vectors automatically if they have certain simple structures. For example, the vector u we constructed earlier contains equally spaced numerical data from 0 to 1. Let’s automatically construct a similar vector with even smaller spacing between numbers. We will use the seq command (short for sequence). Try entering x <- seq(0, 1, by = 0.1). Then enter x to print the results. What is the spacing between consecutive numbers? How would we specify different spacing? A different starting value? Ending value? Remember that you can learn more about seq by entering ?seq (not shown).

Building a vector using seq
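As one possible way to explore those questions, the sketch below builds x and then a second sequence with a different start, end, and spacing. The second set of numbers is chosen purely for illustration.

```r
x <- seq(0, 1, by = 0.1)   # from 0 to 1 in steps of 0.1
x
#> [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

length(x)
#> [1] 11

# The first two arguments set the start and end; by sets the spacing
seq(2, 3, by = 0.25)
#> [1] 2.00 2.25 2.50 2.75 3.00
```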

Basic plotting

One of the most useful features of R is its ability to create nice plots. In this class we will usually use commands from the package ggplot2 (part of the tidyverse) to create plots. However, sometimes it is useful to generate quick plots using the plotting tools in base R. We will illustrate this by creating a simple plot using the plot command.

R typically creates plots from data. What this means is that instead of asking R to plot the function \(y=x^2\) like you might do with a graphing calculator or something like Desmos or Wolfram Alpha, you need to define vectors x and y with the data you want to plot. We have already defined a suitable vector x above. Let’s define the vector y so that its elements are just the squares of the corresponding elements of x. Then x and y will have the desired relationship. To this end, enter y <- x^2. Then enter y to print out its values so you can inspect them.

Defining a new vector y so that \(y=x^2\).
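Because ^ works on every element of a vector, a single command squares all of x at once. A short sketch, redefining x so that the example stands on its own:

```r
x <- seq(0, 1, by = 0.1)
y <- x^2    # square each element of x
y
#> [1] 0.00 0.01 0.04 0.09 0.16 0.25 0.36 0.49 0.64 0.81 1.00
```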

Now that we have our data we are ready to create our first plot using R! We will produce a scatter plot using x for our x values and y for our y values. Try entering the command plot(x, y). The plot will appear in the Plots tab in the lower-right pane in the RStudio window.

Basic plotting in R.

By default, R plots the data as a scatter plot. We can change the plot type to create a line plot instead (R draws line segments between the points). Try the command plot(x, y, type = 'l').

Using lines instead.

We have a lot of flexibility when it comes to building plots in R, even just within base R. For example, we can plot points and then add the line segments by entering plot(x, y) then lines(x, y).

Adding lines to a scatter plot.
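The three plotting variations described so far are collected in the sketch below. Each call to plot starts a new plot, while lines adds to the plot that is already there.

```r
x <- seq(0, 1, by = 0.1)
y <- x^2

plot(x, y)               # scatter plot (the default)

plot(x, y, type = "l")   # line plot: segments drawn between the points

plot(x, y)               # points first...
lines(x, y)              # ...then line segments added to the same plot
```

In RStudio you can use the arrows in the Plots tab to step back through earlier plots.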

We can further customize our plots by specifying axis labels, markers, colors, etc. You can explore the documentation for the plot function using ?plot or you can search online for more information.

Customizing an R plot.
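As a starting point for experimenting, here is one possible customized plot. The title, labels, color, and marker are arbitrary choices made for illustration and are not necessarily the ones used for the figure above; main, xlab, ylab, col, and pch are standard arguments in base R graphics.

```r
x <- seq(0, 1, by = 0.1)
y <- x^2

plot(x, y,
     type = "b",             # draw both points and connecting lines
     main = "A simple plot", # plot title
     xlab = "x",             # x-axis label
     ylab = "y = x^2",       # y-axis label
     col = "blue",           # color of points and lines
     pch = 19)               # marker style: solid circle
```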